ECQ$^{\text{x}}$: Explainability-Driven Quantization for Low-Bit and Sparse DNNs

Abstract

The remarkable success of deep neural networks (DNNs) in various applications is accompanied by a significant increase in network parameters and arithmetic operations. Such increases in memory and computational demands make deep learning prohibitive for resource-constrained hardware platforms such as mobile devices. Recent efforts aim to reduce these overheads while preserving model performance as much as possible, and include parameter reduction techniques, parameter quantization, and lossless compression techniques. In this chapter, we develop and describe a novel quantization paradigm for DNNs. Our method leverages concepts of explainable AI (XAI) and of information theory: instead of assigning weight values based only on their distances to the quantization clusters, the assignment function additionally considers weight relevances obtained from Layer-wise Relevance Propagation (LRP) and the information content of the clusters (entropy optimization). The ultimate goal is to preserve the most relevant weights in the quantization clusters of highest information content. Experimental results show that this novel Entropy-Constrained and XAI-adjusted Quantization (ECQ$^{\text{x}}$) method generates ultra low-precision (2–5 bit) and simultaneously sparse neural networks while maintaining or even improving model performance. Due to the reduced parameter precision and the high number of zero-elements, the rendered networks are highly compressible in terms of file size, up to 103$\times$ compared to the full-precision unquantized DNN model. Our approach was evaluated on different types of models and datasets (including Google Speech Commands, CIFAR-10 and Pascal VOC) and compared with previous work.
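The assignment idea described in the abstract can be illustrated with a small sketch. This is not the chapter's exact formulation: the function name `assign_clusters`, the weighting parameters `lam` and `rho`, and the additive form of the relevance penalty on the zero cluster are all assumptions made for illustration. Each weight is given to the cluster with the lowest cost, where the cost combines squared distance, the cluster's information content (so frequently used clusters are cheaper), and a penalty that discourages zeroing out weights with high relevance scores. The relevances would come from LRP in practice; here they are simply passed in as an array.

```python
import numpy as np

def assign_clusters(weights, relevances, centroids, lam=0.1, rho=1.0, n_iter=5):
    """Hypothetical sketch of an entropy- and relevance-aware cluster assignment.

    weights    : 1-D array of full-precision weights
    relevances : 1-D array of non-negative relevance scores (e.g. from LRP)
    centroids  : 1-D array of quantization levels, one of them (near) zero
    lam, rho   : assumed trade-off factors for entropy and relevance terms
    """
    weights = np.asarray(weights, dtype=float)
    relevances = np.asarray(relevances, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    zero_idx = int(np.argmin(np.abs(centroids)))  # index of the zero cluster

    # Squared distance of every weight to every centroid: (n_weights, n_clusters).
    dist = (weights[:, None] - centroids[None, :]) ** 2
    assign = np.argmin(dist, axis=1)  # start from plain nearest-centroid

    for _ in range(n_iter):
        # Empirical cluster probabilities and their information content in bits.
        probs = np.bincount(assign, minlength=len(centroids)) / len(weights)
        info = -np.log2(np.maximum(probs, 1e-12))
        cost = dist + lam * info[None, :]
        # Assumed relevance adjustment: relevant weights pay extra for being zeroed.
        cost[:, zero_idx] += rho * relevances
        assign = np.argmin(cost, axis=1)

    return assign, centroids[assign]

# Usage: the second weight is small but highly relevant, so it is kept non-zero.
idx, quantized = assign_clusters(
    [0.04, 0.04, -0.5, 0.5], [0.0, 1.0, 0.0, 0.0], [-0.5, 0.0, 0.5], lam=0.0, rho=1.0
)
print(idx, quantized)
```

With `rho=0` the sketch reduces to a plain entropy-constrained assignment, which tends to pull weights toward the most populated cluster (typically zero) and so increases sparsity; raising `rho` protects relevant weights from that pull.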

Similar resources

Multi-mode matrix quantizer for low bit rate LSF quantization

In this paper, we introduce a novel method for quantization of line spectral frequencies (LSF) converted from mth order linear prediction coefficients. In the proposed method, the interframe correlation of LSFs is exploited using matrix quantization where N consecutive frames are quantized as one m-by-N matrix. The voicing-based multi-mode operation reduces the bit rate by taking advantage of t...

Robust vector quantization for low bit rate speech coding

Speech coding systems for mobile communication have to cope with noisy channels. In particular, vector quantization as the central data reduction scheme is highly sensitive to transmission errors due to the low redundancy in the encoded data. Here we present three methods for the design of a vector quantizer with enhanced robustness against transmission errors. First the optimization of the index a...

Joint quantization strategies for low bit-rate sinusoidal coding

Transparent speech quality has not been achieved at low bit rates, especially at 2.4 kbps and below, which is an area of interest for military and security applications. In this paper, strategies for low bit rate sinusoidal coding are discussed. Previous work in the literature on using metaframes and performing variable bit allocation according to the metaframe type is extended. An optimum meta...

Double-Bit Quantization for Hashing

Hashing, which tries to learn similarity-preserving binary codes for data representation, has been widely used for efficient nearest neighbor search in massive databases due to its fast query speed and low storage cost. Because it is NP hard to directly compute the best binary codes for a given data set, mainstream hashing methods typically adopt a two-stage strategy. In the first stage, severa...

Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization

Low-bit deep neural networks (DNNs) have become critical for embedded applications due to their low storage requirements and computing efficiency. However, they suffer from a non-negligible accuracy drop. This paper proposes the stochastic quantization (SQ) algorithm for learning accurate low-bit DNNs. The motivation comes from the following observation. Existing training algorithms approximate...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-04083-2_14